ABSTRACT



This research investigates the use of a voice recognition 
system by military operators -- officer, enlisted, male and 
female. The application intended is the use of a discrete 
utterance voice recognition system in a command center environ- 
ment. The system would be used by members of a watch team to 
execute ad hoc queries against an automated data base in 
support of their command center duties. The following
factors were examined: 

-- the adaptability of a random sample of active duty 
military personnel to a voice input system. 

-- the accuracy of such a system. 

-- the effects of male versus female operators. 

-- the effects of officer versus enlisted operators. 

-- the advantages/disadvantages of using three, five 
or ten training passes to train the voice system. 

Results showed no significant difference in error rates
between the categories of officer and enlisted nor between
male and female. Three training passes had a slightly
higher error rate than five or ten passes but five and ten
passes were the same.

I. INTRODUCTION



A. BACKGROUND 

1. Voice Technology

"It is only a matter of time until automatic speech 
recognition (ASR) becomes a major force in man-machine 
communication because of the inherent advantages of 
speech communication and our increasing need to commu- 
nicate with machines. The inherent advantages of speech 
arise from its universality, convenience, and speed." 

[Ref. 1]. 

Speech is the human's fastest and most convenient 
method of communicating and consequently little or no 
operator training is required if speech is used as the inter- 
face between man and computer. In experiments involving 
speech and other forms of machine communication (e.g., 
typing), information is exchanged almost twice as fast with
speech [Ref. 2]. In addition to the speed and ease of
training, speech input frees the operators' hands and eyes
for other tasks [Ref. 3].

The use of voice input to machines can be categorized 
into three modes of operation: 

-- voice response. 

-- speaker verification. 

-- speech recognition. 

VOICE RESPONSE is the area of voice input which deals
with speech synthesis -- voice readout of computer-stored
data. The appropriate message is selected from a stored
vocabulary by a synthesis program and then given to a
synthesizer device which generates a signal for transmission
over a voice circuit [Ref. 4].

SPEAKER VERIFICATION involves authenticating the 
identity of a speaker according to measurements on his voice 
signal. Applications for speaker verification systems 
include voice lock/unlock security systems and banking and 
credit transactions [Ref. 5].

SPEECH RECOGNITION is giving commands to machines 
by voice. The machine does not have to identify the speaker, 
only "recognize" what is said. The commands can be given 
by any speaker as long as his or her voice patterns match 
those parameters for the desired stored command. Speech 
recognition systems are used for baggage and parcel sorting, 
quality control on production lines and voice direction of 
machine tools. They are typified by small word vocabularies 
spoken by a small population of users or large vocabularies 
(several hundred words) for speakers who allow the machine 
to calibrate their voices [Ref. 6].

The first experiments with speech input to machines 
were done in the 1950’s using vowel and digit recognition 
systems. Today there are commercially available isolated 
word recognition systems which easily handle small vocabularies 
from a known set of speakers. Actual systems in use today
include United Air Lines' baggage handling system, Ford
Motor Company's assembly line inspection of cars, Union
Carbide's nuclear products manipulation system at Oak Ridge
and Lockheed's quality control inspection line in Sunnyvale,
California.

There are two features which characterize the 
complexity of the speech recognition task: 

-- whether the speech is connected or spoken one word at
a time.

-- the size of the vocabulary. 

In connected speech the acoustic characteristics of sounds 
and words have greater variability. In addition, it is 
difficult to determine where one word ends and the next 
begins. As the number of words in the vocabulary and the
number of different contextual variations per word increase,
the storage required to store all reference patterns becomes
enormous.

The principal difficulty in automatic speech recog- 
nition is not due to a lack of speech understanding but to 
the massive amount of memory and time required to store and 
process the required data. Recent progress has been limited 
more by advances in data processing than in speech recognition 
technology [Ref. 7].

Therefore, a major disadvantage of speech recognition 
systems is the requirement for large amounts of memory and 
processing time. Some additional problems are: 

-- speaker variability due to sex and dialect makes
recognition very difficult.

-- speech communication is not private.

-- speech communication may be subject to environmental 
noise and distortions. 

-- voice input is expensive in comparison to other 
input/output devices. (The cost of voice input 
devices ranges from $200 to $80,000 which includes 
a wide variety of capabilities.) 

In spite of these restrictions, applications for 
voice systems today include several areas: 

a. voice readout of numerals. 

(1) telephone numbers. 

(2) assembly of equipment. 

(3) stock price quotations.

(4) inventory reporting. 

(5) automatic directory assistance. 

b. industrial applications. 

(1) special purpose computer programming for machine 
tools.

(2) quality control inspection systems. 

(3) equipment handling and sorting systems. 

c. editing of financial information. 

This thesis will address another application for today's 
voice recognition systems -- that of command and control. The 
implication here is not command and control in the sense of 
voice communication with machines but in the military appli- 
cation of a management information system which provides 
data on resources available.

2. Command, Control and Communications (C3)



In 1972 the Honeywell 6000 computer (H6000) was 
installed at Commander in Chief Naval Forces Europe 
(CINCUSNAVEUR) in support of the World Wide Military 
Command and Control System (WWMCCS). The H6000 transferred
CINCUSNAVEUR from the first generation of computer systems -- 
characterized by card decks and single job processing -- to 
the third generation of multiprogramming, timesharing and 
terminal input/output. What existed at CINCUSNAVEUR in 
the way of "computer support" prior to the H6000 was a very
"user unfriendly" AN/UYK computer which required a great
deal of expertise and very specific procedures to operate. 

Consequently, when the H6000 was installed, the staff, 
conditioned by the difficulties of using the prior data 
processing equipment, was very reluctant to have a computer 
replace their filing cabinets. After several years of 
software changes, updates to the Navy WWMCCS Software 
Standardization System (NWSS) were being passed from the 
fleet by AUTODIN to the H6000. Messages were not manually 
manipulated unless they were kicked out of the system because 
of errors. 

In spite of the fact that inputs to the database were 
being electrically transmitted from AUTODIN to the H6000 
before the communication center could distribute the paper 
copy, the staff, for the most part, avoided the NWSS query
module and held to their filing cabinets. Training sessions
given by the software developers on how to use NWSS were not 
well attended. User reaction to the system was so negative 
that a separate shop for monitoring the database and correct- 
ing the error messages had to be formed using ADP resources. 
That is, the users who were supposed to be responsible for 
data content passed the responsibility off to the data 
processors.

In 1978, a preliminary evaluation of the man-machine 
interface of the NWSS query module was done by Naval Ocean 
Systems Center [Ref. 8]. The reason for the study was to 
investigate the possibility of simplifying the query module 
since the module, while it is very powerful, is also rather 
confusing to the infrequent user. There are nonstructured 
query systems being tested on data bases similar to NWSS -- 
LADDER, for example -- which would provide the user with 
a much easier access to the data. LADDER (Language Access 
to Distributed Data Bases with Error Recovery) will allow a 
user to ask the computer a question in plain English ("Where
is the Kennedy?") instead of requiring a specific format and
specific command words. The free format LADDER query system 
has been in test and development status since 1977. 

But let’s take it a step further. Even if a 
relatively free format query system was available from NWSS, 
chances are a good percentage of the staff would still not
be interested -- because it still requires the user to sit
in front of a terminal and find characters which are 
randomly spread over the keyboard. (Would Star Trek ever 
have been so popular if Captain Kirk had to wheel up to 
a keyboard and begin typing instead of just facing the panel 
and speaking into it?) If using the NWSS query module was 
as easy as loading a tape of voice patterns and "speaking" 
the query to the computer, would there be less reluctance 
on the part of the staff and command center team to use the 
automated data base instead of going to the files? 

The problem of C3 today is significantly more complex 
than at any time in the past. To be competitive in today's 
automated world, some extension of man's memory and compu- 
tational abilities is needed. How can this capability be 
provided without requiring an excessive amount of training?
Is it possible to provide a computer tool without requiring
typing skills to use it?

The easier it is to access the data, the more likely 
the staffer will be to use it. The easiest way for a
non-data-processor to interface with a computer is simply to talk to
it. Consideration for the use of a voice interface with the
automated information system would include such questions
as:

Is it feasible to utilize a voice recognition system 
in an environment such as a command center where each 
member of the watch team could query the computer by 
voice? 






Is it cost effective to train a military member 
to use a voice recognition system and could it be 
done in a negligible amount of time? 

Would voice input in terms of today's technology 
be adaptable for female as well as male usage? 

What are the tradeoffs in using three, five or 
ten training passes in terms of training time, error 
rates and user psychology? 

Would it be feasible in terms of system resources 
to store voice patterns for every member of the watch
section on the computer? 

Would stress vary the voice patterns to such an 
extent that the voice input system would be unacceptable 
in the varying stress situations of the command center 
environment?

With these thoughts in mind, this thesis investigates the 
use of a voice recognition system by military operators -- 
male, female, officer, enlisted -- from technical and non- 
technical backgrounds. 



B. OBJECTIVES 

The objective of this thesis was to explore the use of 
a voice recognition system by a random sample of active duty 
military personnel. Specifically, to determine the effective- 
ness of such a system in each of the following three cases: 

1. Male Operators versus female operators: 

The female voice generally has a higher pitch than 
the male voice due to the spread of the harmonics in 
the frequency spectrums of the female. This factor 
causes problems in frequency resolution and conse- 
quently the female voice has been particularly hard 
for machines to recognize [Ref. 9]. There has been 
very little work done with female subjects and voice 
recognition systems. Any system to be used in a 
command center environment will more than likely have
female as well as male operators. Thus, one of the
main objectives of this study was to compare the
error rates of the machine using operators of
both sexes.

2. Officer operators versus enlisted operators: 

Another group of subjects that has had little 
documented experience with the voice recognition 
system is that of enlisted personnel. Seemingly, 
there should be no difference between officer and 
enlisted. However, this assumption has not been 
tested. The likely candidate for use of the voice 
recognition system in the command center environment 
would be the enlisted member of the watch team. 
(Hopefully, the ease of use introduced by voice access 
would change this!) The emphasis in this study was 
in the use of operational personnel. The intent was 
to be realistic in the experience levels of the 
proposed operators in order to provide a true picture 
of the adaptability of the operators to the equipment 
and the training required for them to use the 
equipment.

3. Three, five, or ten training passes to train the 
voice recognition system: 

The accepted algorithm used to train the voice 
recognition system in this experiment requires ten 
training passes to "learn" to recognize the operator's 
utterance. In an extensive vocabulary this can demand 
a considerable amount of time and can conceivably 
introduce errors in the training process if boredom 
and/or fatigue take over. There is an algorithm 
available to train using five or three utterances as 
well as ten. The final area examined was the use 
of three or five training passes vice ten.

II. METHOD



A. DESIGN 

Figure 1 shows the conceptual design for this experiment. 
It is a three-way nested hierarchical analysis of variance.

Each of the four groups -- male enlisted, male officer, 
female enlisted, female officer -- consists of ten subjects. 
Each subject trained and tested the voice recognition system 
using three, five and ten training passes in a random order. 

B. SUBJECTS 

Forty active duty military volunteers participated in 
this study. There were ten female officers, ten female 
enlisted, ten male officers and ten male enlisted. 

The enlisted subjects were all Navy members stationed 
at the Naval Postgraduate School. Their ranks ranged from
E1 to E8. Their rates were: Religious Program Specialist,
Yeoman, Personnelman, Mess Management Specialist, Intelligence
Specialist, Data Processor, Storekeeper, Air Intercept
Controller, Electronics Technician (including fire control
specialist).

The officers were from three U.S. services -- Navy, Army,
Air Force -- and the Canadian Forces. They ranged in grade
from O3 to O5. All but two were NPS students in the C3,
Operations Research, Telecommunications Management,
Intelligence, Personnel Management and Communications
Engineering curricula. The other two were an Army chemical
officer from Fort Ord and an Air Force navigator stationed
at the Joint Chiefs of Staff. The backgrounds of the officers
were: special warfare, National Oceanic and Atmospheric
Administration, ADP, intelligence, telecommunications,
cryptology, acquisition, aviator, aerospace engineering,
management analysis and communications.

FIGURE 1. CONCEPTUAL DESIGN OF EXPERIMENT

Based on a questionnaire given to each subject before 
performing the exercise, all but four thought voice input 
would be easier and less frustrating than typing as a means 
of input to the computer. Sixteen of the forty subjects 
had used or seen voice input used but only two had more 
than an introduction to voice response systems. 

C. EQUIPMENT 

The equipment used in this research was a Threshold 
Technology, Incorporated, Model T600 discrete utterance 
voice recognition system which was located inside an 
Industrial Acoustic Company sound reduction chamber. The 
microphone used was a Shure SM10 head microphone. 

The Model T600 consists of four basic components (see
Figure 2):

-- preprocessor unit consisting of an analog speech
preprocessor and a digital input/output interface.

-- operator console/microphone preamplifier.

-- tape cartridge unit.

-- CRT display and console.

FIGURE 2. EQUIPMENT SET-UP

The preprocessor accepts the speech from the microphone 
preamplifier, extracts speech parameters and converts these 
to digital signals which are processed by the microcomputer. 

The microcomputer compares the input signals with stored 
reference patterns to determine which, if any, of the vocabu- 
lary words were spoken. If a close match is found between 
the input speech pattern and one of the reference patterns, 
a user defined character string is sent to the user's device 
via the output interface. If no match is found the system 
emits a "beep" sound. 

The reference patterns are generated during the "training
mode" which requires a speaker to give several repetitions
of each utterance with a variety of inflections as would be
used in normal speech. The number of repetitions required
is usually ten but for this experiment additional logic was 
added to the T600 to allow the use of three or five repeti- 
tions. An utterance can be a single word ("grid") or group 
of words ("command and control") lasting from a tenth of a 
second to two seconds. The only requirement is that the
utterance contain no pauses of a tenth of a second or
greater. If a tenth of a second pause is made, the T600
will treat the sound as two utterances instead of the intended
one. Up to 256 utterances are allowed on this system [Ref. 10].
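
The pause rule above can be illustrated with a minimal sketch. It assumes that speech activity has already been reduced to one true/false value per frame; the function name `split_utterances`, the 2 ms frame length and the frame-counting approach are illustrative assumptions and are not taken from the T600 documentation. Only the limits (a tenth of a second pause, utterances of a tenth of a second to two seconds) come from the text.

```python
def split_utterances(active, frame_ms=2, pause_ms=100, min_ms=100, max_ms=2000):
    """Group voiced frames into utterances separated by pauses >= pause_ms.

    active: sequence of booleans, one per frame (True = speech present).
    Returns (start_frame, end_frame_exclusive) pairs whose length lies
    between min_ms and max_ms. Hypothetical helper, not the T600 logic.
    """
    pause_frames = pause_ms // frame_ms
    utterances = []
    start = None         # first frame of the utterance being built
    last_voiced = None   # most recent frame that contained speech
    for i, is_speech in enumerate(active):
        if is_speech:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced >= pause_frames:
            # Silence has lasted a full pause: close the current utterance.
            utterances.append((start, last_voiced + 1))
            start = None
    if start is not None:
        utterances.append((start, last_voiced + 1))
    # Keep only utterances inside the allowed duration range.
    return [(s, e) for s, e in utterances
            if min_ms <= (e - s) * frame_ms <= max_ms]
```

With these assumptions, a phrase containing a 98 ms gap stays one utterance, while the same phrase with a 100 ms gap is returned as two.
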

Each utterance processed by the T600 is passed through
nineteen bandpass filters which span the speech spectrum. 

The overall signal spectral shape is then described using 
a spectral shape detector which calculates the rate of change 
of energy level with respect to frequency. The spectral 
shape and its changes over time are calculated every two 
milliseconds to determine the presence or absence of thirty- 
two acoustic features. When the end of the utterance is 
detected, the duration of the utterance is divided into 
sixteen time segments and reconstructed into a normalized 
time base. The T600 extracts a 512-bit feature matrix -- 32 
binary features by 16 time segments -- for each version of
an utterance. Then all matrices (three, five or ten) are 
combined to produce a single reference matrix for an element. 

When an utterance is spoken for recognition by the T600 
a 512-bit descriptive matrix is calculated and weighted 
correlations between this matrix and each reference matrix 
describing the vocabulary utterances are calculated. The
vocabulary utterance with the largest correlation exceeding
some preset threshold value is then selected as the utterance
spoken. If no correlation exceeds the preset threshold value the
T600 emits a "beep" sound [Ref. 11].
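
The training and recognition steps described above can be sketched as follows. The 32 features, 16 time segments and the threshold-or-beep decision come from the text; everything else is an assumption made for illustration, since the source does not give the actual rules: a feature is treated as present in a segment if any frame in that segment shows it, training passes are combined by bit-wise averaging, and the "weighted correlation" is replaced by a simple agreement score in which bits the reference is unsure about carry less weight. The function names and the 0.5 threshold are hypothetical.

```python
N_FEATURES = 32   # binary acoustic features (from the text)
N_SEGMENTS = 16   # normalized time segments (from the text)

def time_normalize(frames):
    """Collapse a variable-length list of 32-bit frames into a 16 x 32 matrix.

    Assumption: a feature is present in a segment if any frame in that
    segment has it set.
    """
    n = len(frames)
    matrix = []
    for seg in range(N_SEGMENTS):
        lo = seg * n // N_SEGMENTS
        hi = max(lo + 1, (seg + 1) * n // N_SEGMENTS)
        chunk = frames[lo:hi]
        matrix.append([int(any(f[k] for f in chunk)) for k in range(N_FEATURES)])
    return matrix

def build_reference(matrices):
    """Average the per-pass matrices bit by bit (assumed combination rule)."""
    passes = len(matrices)
    return [[sum(m[s][k] for m in matrices) / passes
             for k in range(N_FEATURES)] for s in range(N_SEGMENTS)]

def score(matrix, reference):
    """Agreement in [-1, 1]; stands in for the T600's weighted correlation."""
    total = 0.0
    for s in range(N_SEGMENTS):
        for k in range(N_FEATURES):
            total += (2 * matrix[s][k] - 1) * (2 * reference[s][k] - 1)
    return total / (N_SEGMENTS * N_FEATURES)

def recognize(matrix, references, threshold=0.5):
    """Return the best label whose score exceeds threshold, else None ("beep")."""
    best_label, best_score = None, None
    for label, ref in references.items():
        sc = score(matrix, ref)
        if sc > threshold and (best_score is None or sc > best_score):
            best_label, best_score = label, sc
    return best_label
```

A reference built from three, five or ten passes is used identically at recognition time; only the number of matrices passed to `build_reference` changes.
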

The T600 has a magnetic tape cartridge unit which allows 
the user to build his vocabulary reference patterns and store 
them on a tape cartridge. When the subject wants to use the
equipment, the tape is loaded into the preprocessor unit.

This also allows a user to build a vocabulary for different 
tasks. He can then load the voice patterns for the task 
he needs to execute. Since the operator is not dependent 
on any large computer to store his voice patterns, the equip- 
ment can easily be moved and still be operational. 

D. PROCEDURE 

At the beginning of the session, subjects were given a 
questionnaire regarding their opinions on voice input versus 
manual typing. (See Appendix A.) The objectives of the 
experiment were explained along with an introduction to the 
voice recognition equipment used and the procedure to be 
followed. The subject was then seated in a controlled 
acoustical environment chamber in front of a video display 
and given instructions on how to train the equipment. (See 
Appendix B.) 

The vocabulary used in this test consisted of fifty 
utterances -- words and phrases -- varying in length from 
one to five syllables. The utterances were not chosen to 
test the machine's ability to distinguish between similar 
sounds -- "get" and "met," for example. The only considera- 
tion in choosing the vocabulary was to have the same number 
of utterances in each syllable category -- ten one-syllable
words, ten two-syllable words, etc. The vocabulary list
is shown in Appendix C. Appendix D contains the Confusion
Matrix.

Once the subject was introduced to the experiment and
equipment, the head mike was mounted and the subject began 
training the fifty-word vocabulary using either three, five 
or ten training passes. The number of training passes used 
first was randomly determined so that each would be used 
first the same number of times. That is, one-third of the 
subjects started out using ten training passes. Another third 
used three training passes first and the last third started 
out using five training passes. 
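
The balanced random ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `assign_orders` is hypothetical, and because forty subjects cannot be divided evenly among three starting conditions, the sketch lets the counts differ by one. The text does not say how the remainder was actually handled.

```python
import random

CONDITIONS = (3, 5, 10)  # training passes per utterance

def assign_orders(subject_ids, seed=None):
    """Give each subject a random condition order with balanced first conditions."""
    rng = random.Random(seed)
    n = len(subject_ids)
    # Cycle through the conditions so each is used first about n/3 times,
    # then shuffle so the assignment to individual subjects is random.
    firsts = [CONDITIONS[i % len(CONDITIONS)] for i in range(n)]
    rng.shuffle(firsts)
    orders = {}
    for sid, first in zip(subject_ids, firsts):
        rest = [c for c in CONDITIONS if c != first]
        rng.shuffle(rest)  # order of the remaining two conditions is also random
        orders[sid] = [first] + rest
    return orders
```

Passing a fixed `seed` makes the assignment reproducible, which is convenient when the order must be documented alongside the results.
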

The training procedure involved repeating an utterance 
the required number of times and then testing the equipment 
by repeating the utterance two or three times. If the 
machine did not respond correctly two out of three times 
the utterance was retrained. Once the entire vocabulary 
was trained, the subject tested the equipment by reading 
through the vocabulary list twice (100 utterances). Any
"beeps" or incorrect responses were noted by the experimenter.
This entire procedure was repeated using a different number 
of training passes until each subject had trained and tested 
the equipment using three, five and ten training repetitions. 
Subjects were allowed to rest, ask questions or get a drink
at any time during the procedure.

E. DEPENDENT VARIABLES 

After the training session each subject read through the
list of words two times. A record was kept of each time the
machine responded with a "beep" or an incorrect utterance.
A record was also kept of the time each subject took to
complete the experiment.

III. ANALYSIS AND RESULTS



A. HYPOTHESES 

The following hypotheses were to be tested: 

1. Hypothesis regarding male and female subjects.

H0: "There is no difference between male and female
users of the voice recognition system."

H1: "The null hypothesis is false."

2. Hypothesis regarding officer and enlisted subjects.

H0: "There is no difference between officer and
enlisted users of the voice recognition system."

H1: "The null hypothesis is false."

3. Hypothesis regarding number of training passes.

H0: "There is no difference in recognition accuracy
when a different number of training passes is
used in the voice recognition system."

H1: "The null hypothesis is false."

B. RESULTS FOR SEX 

The results of this experiment for male and female
subjects are shown graphically in Figure 3. The machine's
performance for men was slightly better than for women -- a 1.8%
error rate for men versus 2.11% for women based on twenty
subjects making 6000 utterances in each sex category.

However, the analysis of variance (ANOVA) results in Table I
show an F ratio of .45 which indicates no significant statisti-
cal difference in the gender of the operator. Thus the null
hypothesis is not rejected. This result speaks highly
for the algorithm used by Threshold. It would appear they
have a good handle on the additional requirements needed
to process the female voice.

FIGURE 3. ERRORS VS. SEX (axis: mean percentage error rate)

TABLE I

ANALYSIS OF VARIANCE

SOURCE                          SS       df    MS       F

Total                          3.1013   119
Between Subjects               1.6172    39
Male/Female                     .0199     1   .0199    .4584
Enlisted/Officer                .0183     1   .0182    .4217
Sex x Rank                      .0197     1   .0197    .4552
Error (B)                      1.5594    36   .0433
Within Subjects                1.4841    80
Training Passes                 .2835     2   .1418   9.1427
Training Passes x Sex           .0330     2   .0165   1.0650
Training Passes x Rank          .0197     2   .0983   6.3396
Training Passes x Sex x Rank    .0314     2   .0157   1.0129
Error (W)                      1.1165    72   .0155

SS - sum of squares
df - degrees of freedom
MS - mean square
F - F ratio
? - probability of error

This result further establishes the possibility of using 
a voice recognition system in a command center environment. 

The highest probability of error occurred with female subjects 
but even then the mean percentage error was only 2.1%. That 
is, out of one hundred utterances (an utterance, again, 
being a single word or group of words) spoken by a female 
watch team member to the computer, all but three would be 
interpreted correctly. If these utterances were being typed, 
a greater probability of error would exist since one 
utterance could have as many typing errors as there are 
characters in the utterance. 

C. RESULTS FOR RANK -- OFFICER VS. ENLISTED 

Figure 4 shows the comparison of machine errors for the 
two categories of officer and enlisted. The machine's 
performance for the enlisted was slightly better than for 
officers -- 1.85% versus 2.05% mean error percentage based 
on twenty subjects making 6000 utterances in each rank 
category. 

However, the statistical results from the ANOVA (Table I)
show an F ratio of .42. Therefore, there is no significant
statistical difference in the error rate of the T600 when
used by officer or enlisted personnel. Based on these
statistics, the use of a voice system should be favorable
to either military member of the watch team.

FIGURE 4. ERRORS VS. RANK

D. RESULTS FOR NUMBER OF TRAINING PASSES -- THREE, FIVE OR TEN

Figure 5 shows the relationship between number of 
training passes and rank. Figure 6 shows the relationship 
between number of training passes and sex. In each case 
the percentage of error for training the T600 with five or 
ten training passes is about the same -- around 1% error 
for both ranks and both sexes. However, the percentage 
of error using three training passes is significantly 
higher -- around 2.7% based on rank and 2.4% to 3% based on 
sex.

This graphical interpretation is supported statistically
in the ANOVA with a significance level of .01. That is, the 
F ratio is 9.14 which is well above the 4.79 required for 
an alpha level of .01. Based on the F ratio, the null 
hypothesis is rejected. Therefore, there is a significant 
difference in recognition accuracy of the T600 when a differ- 
ent number of training passes is used. A Duncan Range test 
was performed to verify that the difference in performance 
was between three training passes and five or ten training 
passes. Five and ten passes had about the same probability
of error. Even though three training passes has a
significantly higher percentage of error over the five and
ten passes, it is still only a 3% error rate.

FIGURE 6. NUMBER OF TRAINING PASSES VS. SEX

The ANOVA also showed a significant interaction (alpha 
level less than .01) between the number of training passes 
used and the rank of the subject. This would imply that an 
enlisted user would have a lower error rate if he trained 
the system using five training passes and an officer user 
would get better recognition if he used ten training passes. 

A t-test was performed to determine if five and ten passes 
for officers and five and ten passes for enlisted were 
indeed different since this interaction seemed unrealistic. 

The t-test showed both t-statistics (.7682 for women officers 
and -1.3125 for enlisted women) were within the 95% acceptance 
region. Therefore, the t-test shows there is no difference 
in error rate when using five or ten training passes for 
either officer or enlisted category. 
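
As a cross-check on the within-subjects F ratios discussed above, the sketch below recomputes the mean squares and F ratios from the sums of squares and degrees of freedom printed in Table I, using MS = SS / df and F = MS / MS(error). The dictionary layout and function names are illustrative assumptions; the numbers are copied from the table.

```python
# (SS, df) pairs as printed in Table I, within-subjects section.
ERROR_W = (1.1165, 72)
EFFECTS = {
    "Training Passes":              (0.2835, 2),
    "Training Passes x Sex":        (0.0330, 2),
    "Training Passes x Rank":       (0.0197, 2),
    "Training Passes x Sex x Rank": (0.0314, 2),
}

def mean_square(ss, df):
    """Mean square is the sum of squares divided by its degrees of freedom."""
    return ss / df

def f_ratios(effects, error):
    """F ratio of each effect against the within-subjects error term."""
    ms_error = mean_square(*error)
    return {name: mean_square(ss, df) / ms_error
            for name, (ss, df) in effects.items()}

if __name__ == "__main__":
    for name, f in f_ratios(EFFECTS, ERROR_W).items():
        print(f"{name:30s} F = {f:.3f}")
```

Three of the four rows recompute to within rounding of the printed F ratios. For the Training Passes x Rank row, the printed SS of .0197 gives an MS of about .0099 and an F of about .64, not the printed .0983 and 6.3396. The printed SS values do sum to the printed Within Subjects total of 1.4841, so the MS and F printed for that row may deserve a second look. This would be consistent with the t-test result reported above.
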

A possible explanation for enlisted performance being 
lower with ten training passes is that five passes allowed 
enough variation to build a good identity matrix and ten 
training passes invited such a degree of boredom that the 
performance was degraded. 

It is interesting to note although the manufacturer 
recommends ten training passes for the best performance of 
the system, the results of this study show no significant 
difference between five and ten training passes. This
result might only apply when a relatively small vocabulary
is used but in a crisis situation this could suggest the 
use of five training passes to get a needed vocabulary on 
tape quickly. As one's experience with the T600 increases, 
the use of fewer training passes may be sufficient. 

The order in which subjects trained the equipment with 
the different number of training passes was randomly assigned 
to prevent any biases in case learning or fatigue factors 
were involved. Figure 7 shows the percent error rate versus 
number of training passes used in the order subjects trained. 
That is, for all subjects who started out the experiment 
using three training passes, the percent error rate was 2.3%.
For all subjects who used five training passes first, the
percent error rate was 2%. Those subjects who used three
training passes after training with five and ten passes 
had a percent error rate of 2.9%. 

If an improvement due to experience was a factor then 
five training passes was the only one which demonstrated 
this. However, the increase in errors as three training 
passes was used second and third could be due to the fact 
that subjects became accustomed to putting a lot of inflec- 
tions in the utterances and when only three passes was used, 
they ran out of training passes before running out of in- 
flections. The increase in errors when ten training passes 
was used last could easily be explained as the fatigue
factor. Most subjects took twice as long to train the fifty-
word vocabulary using ten training passes as they did using
three passes. By the time they were training and testing 
for the third time the novelty had begun to wear off and 
voices were getting tired. 

A correlation was run on three passes versus five passes, 
five versus ten and three versus ten to see if a subject who 
performed well on three training passes did better with five 
and ten passes. Only the results of the three-five corre- 
lation, .67, are significant at .05. The five-ten correla- 
tion was .23 and the three-ten correlation was .11. Neither 
of these is significantly close to 1 or -1 and, therefore, 
little correlation is evident for these two cases. 
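
The significance statements above can be checked with the usual t statistic for a correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below assumes that all forty subjects contributed a pair of scores to each correlation (the text does not state n) and uses the two-tailed .05 critical value of about 2.024 for 38 degrees of freedom. The function name is hypothetical.

```python
import math

def t_from_r(r, n):
    """t statistic for testing a correlation r from n pairs against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

T_CRIT_05 = 2.024  # two-tailed, 38 degrees of freedom

if __name__ == "__main__":
    for label, r in (("three-five", 0.67), ("five-ten", 0.23), ("three-ten", 0.11)):
        t = t_from_r(r, 40)
        verdict = "significant" if abs(t) > T_CRIT_05 else "not significant"
        print(f"{label:10s} r = {r:.2f}  t = {t:.2f}  {verdict} at .05")
```

Under these assumptions only the three-five correlation exceeds the critical value, which agrees with the result reported in the text.
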

E. RESULTS FOR NUMBER OF UTTERANCE SYLLABLES -- 1, 2, 3, 4, 5

Figures 8 through 10 show the recognition error rate
for the number of training passes versus the number of syllables 
in the utterance. In Figure 8, using three training passes, 
the T600 misinterpreted one-syllable utterances (words 0 
through 4 and 25 through 29 in Appendix C) 28 times out of 
800 utterances (40 subjects x 10 utterances x 2 repetitions 
for each utterance) for a percentage error rate of 3.5%. 
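
The arithmetic behind this figure can be reproduced with a minimal sketch. The function name and parameter names are illustrative; the counts are those given in the text.

```python
def percent_error_rate(errors, subjects, utterances, repetitions):
    """Percentage of misrecognized utterances out of all utterances spoken."""
    total = subjects * utterances * repetitions
    return 100.0 * errors / total


# One-syllable utterances with three training passes, as reported above:
# 28 errors out of 40 subjects x 10 utterances x 2 repetitions.
rate = percent_error_rate(errors=28, subjects=40, utterances=10, repetitions=2)
print(f"{rate:.1f}%")
```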

With one exception, the percentage error rate decreased as
the number of syllables increased for all three training
matrices. This seems reasonable since a greater number of
syllables gives the T600 more unique data with which to build
a recognition matrix for the utterance. The exception for both
three and five passes is two syllables. That is, the
percentage error rate decreases for utterances from one to
five syllables with the exception of two syllables, where
the error rate is greatest. In the case of ten training
passes, the exception is three-syllable utterances, with
one syllable having the greatest error rate.

The percentage error rate for five training passes is
significantly better than that for three in all syllable
categories. With the exception of two and five syllables, it is
also better than that for ten training passes. The best system
performance was obtained with five-syllable utterances and ten
training passes.
IV. DISCUSSION AND CONCLUSIONS



The main points brought out in the preceding results
section are:

1. There was no significant difference in error rates
between officer and enlisted users of the
voice recognition system.

2. There was no significant difference in error rates
between female and male users of the system.

3. There was a significant difference in error rates
in all categories when using three training passes
versus five or ten passes, but five and ten training
passes had the same error rates.

4. There was significant interaction between rank 
and the number of training passes used. 

Based on these results there should be no problem 
technically or psychologically with the use of voice 
recognition systems by military men and women, officer 
or enlisted. Although this experiment was conducted in 
a sound reduction chamber, there are two T600 voice recog- 
nition systems located in the C3 Laboratory at the Naval 
Postgraduate School which are frequently in use. The C3 
Laboratory simulates the environment of a command center. 
There have been no problems with background noise in the 
use of this voice system. Professor R. Elster [Ref. 12]
reported similar results in his study, "The Effects of
Certain Background Noises on the Performance of a Voice
Recognition System."
The enthusiasm and ease with which the subjects used
and trained the equipment are positive signs for the
successful use of voice recognition systems in command centers.
At the time of this writing, a T600 system has been placed
in the command center at Commander in Chief Pacific Fleet
(CINCPACFLT). During the week of 1 December 1980, Dr. Gary
Poock and LT Ellen Roland of the Naval Postgraduate School
faculty gave a demonstration of the T600 voice recognition
system to CINCPACFLT. That staff now has a T600 in the
command center which is being experimented with in a variety
of areas.
APPENDIX A



SUBJECT QUESTIONNAIRE AND ANSWER SHEET 

Please answer the following questions with respect to
your capabilities.

For items 3-7 designate your feelings from strong
feeling for manual input (far left box), no strong feeling
either way (middle box), strong feeling for voice input
(far right box).

For items 8 and 9, designate your feelings from strong
feelings in favor (far right box), no strong feelings either
way (middle box), strong feeling against (far left box).

1. Have you ever used voice input? 

2. Have you ever seen voice input used? 

3. Which might be easier, manual typing input or voice 
input for communicating with a computer? 

4. Would you be more relaxed using manual typing input 
or voice input? 

5. Would you have more flexibility in entering items to a 
computer with voice input or manual typing input? 

6. Would voice input or manual typing allow you more time 
and freedom to do other things? 

7. Would you be more frustrated using voice input or 
manual typing? 

8. In general, do you like the idea of voice input?

9. In general, do you think you would like to use voice
input in everyday tasks yourself if it were applicable?
APPENDIX B



INSTRUCTIONS TO SUBJECTS 

The fifty-word vocabulary being used with the voice 
recognizer in the experiment is attached to these instruc- 
tions. You will be required to repeat each word of this 
vocabulary three, five and ten times to train the recognizer 
to recognize your particular patterns of each word. To
facilitate recognition by the voice recognizer, you should 
include in the repetitions as many as possible of the 
different ways you might say the word in normal speech; for 
example, use different intonations and emphasis, and small 
variations in volume. 

In order to keep track of the number of times you 
say each word when using ten repetitions and to reduce 
breath noise, it is best to speak the ten repetitions in 
several groups. For example, if the word is zero, it is 
better to group them as: 

000 - 000 - 0000

or

000 - 000 - 000 - 0

rather than

0000000000.

Please observe the following guidelines while inputting
voice data to the recognizer.

-- Speak each word crisply and quickly but do not
overpronounce.

-- Leave a distinct pause (specifically, at least one-
tenth of a second of silence) between each word so
that the recognizer can distinguish the end of one
word from the beginning of the next. Do not leave
a period of silence within a word or the recognizer
will mistake it for two separate words.

-- Avoid breathing into the microphone at the end of
words as this will generate false inputs to the
recognizer.
APPENDIX C



VOCABULARY

UTTERANCE                 WORD

GRID                      25
LAUNCH                    26
COURSE                    27
GOLF                      28
SPEED                     29
MESSAGE                   30
ORDERS                    31
PLATFORM                  32
SENSOR                    33
MISSILE
COLORADO
CONNECT TO CHARLIE
COMMAND AND CONTROL
CONTINUOUS SPEECH
VOICE TECHNOLOGY

UTTERANCE

FIRE
TIME
MAP
SCOPE
MAINE
NEUTRAL
REFUEL
WHISKEY
LIMA
LOGOUT
TRACK UNKNOWN
LONGITUDE
TORPEDO
BLUE FORCE ONE
ROMEO
FLIGHT CONTROLLER
SEA OF JAPAN
HONOLULU
ADVANTAGES
CONTINUOUS
TASK FORCE COMMANDER
NORTH CAROLINA
BEARING AND DISTANCE
PLOT ALL SUBMARINES
UNITED AIR LINES